- Path: news.eunet.fi!fipnet!kone!jsaarinen
- Newsgroups: comp.sys.amiga.programmer
- X-NewsReader: IntuiNews 1.2b (31.7.94)
- References: <38232464@kone.fipnet.fi> <4ga21v$lsk@brachio.zrz.TU-Berlin.DE>
- From: "Jyrki Saarinen" <jsaarinen@kone.fipnet.fi>
- Date: Wed, 21 Feb 96 18:00:40 UT
- Comments: Illegal date header - new date added by quicknews
- X-Original-Date: Wed, 21 Feb 96 13:01:37
- MIME-Version: 1.0
- Content-Type: text/plain; charset=iso-8859-1
- Content-Transfer-Encoding: binary
- Subject: Re: Texture/Gouraud innerloop speedtests
- Message-ID: <38232562@kone.fipnet.fi>
-
-
- > >Ok, I did a little research. My CPU is a 40MHz 68040,
- > >a Warp Engine with a very fast memory system, maybe
- > >this is the reason I did not gain any speed even if
- > >I turned the data cache and thus data burst off,
- > >with data burst everything was about 50% slower.
- >
- > Not very surprising! Data burst means that whenever
- > a cache-miss occurs the CPU loads 4 longwords around
- > the mem area where the data to be fetched is. For a
- > tmapping loop this means that for almost any pixel that
- > is fetched from the texture the CPU keeps the bus busy
- > for 4 mem cycles!
-
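To put a number on the burst cost described in the quote above, here is a minimal sketch in C. It assumes the worst case, where every byte-sized texel fetch misses the data cache and triggers a full 4-longword line fill; the function names are made up for illustration and are not from the original posts.

```c
#include <stddef.h>

/* A burst line fill moves 4 longwords of 4 bytes each, as stated in the quote. */
enum { LONGWORD_BYTES = 4, LINE_LONGWORDS = 4 };

/* Bytes moved over the bus by one burst line fill. */
size_t line_fill_bytes(void)
{
    return (size_t)LONGWORD_BYTES * (size_t)LINE_LONGWORDS;
}

/* Total bus traffic if `misses` fetches each trigger a line fill.
 * Assumption: worst case, no two fetches share a cache line. */
size_t bytes_transferred(size_t misses)
{
    return misses * line_fill_bytes();
}

/* Fraction of each filled line that the loop actually uses. */
double useful_fraction(size_t bytes_used_per_miss)
{
    return (double)bytes_used_per_miss / (double)line_fill_bytes();
}
```

Under that worst-case assumption a 320x256 frame of one-byte texel fetches moves 1,310,720 bytes over the bus, of which only 1 byte in 16 is used.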
- ;) I said 50% slower when I switched the data cache OFF.
-
- > >So the frame rates were for a 320x256 screen:
- > >Texture/Gouraud/Shading table, 64k aligned: ~43 fps
- > >Plain Texture, 64k aligned: ~67 fps
- >
- > fps? Are these figures for the mere repetition (320*256 times)
- > of the innerloop?
-
- Yep.
-
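Since the frame rates are just the inner loop repeated 320*256 times, they can be turned into an approximate cycle count per pixel. The following C sketch does the arithmetic; it assumes the nominal 40 MHz clock and ignores loop overhead, and the function names are illustrative only.

```c
#include <stdint.h>

/* Pixels written per second for a full-screen fill at the given frame rate. */
double pixels_per_second(uint32_t width, uint32_t height, double fps)
{
    return (double)width * (double)height * fps;
}

/* Average CPU clock cycles available per pixel at the given clock rate. */
double cycles_per_pixel(double clock_hz, uint32_t width, uint32_t height,
                        double fps)
{
    return clock_hz / pixels_per_second(width, height, fps);
}
```

With the figures quoted above this gives roughly 11.4 cycles per pixel at ~43 fps and roughly 7.3 cycles per pixel at ~67 fps.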
- > > move.b (a3,d0.l),d1
- > > move.b (a4,d1.l),(a0)+
- > [...]
- > > dbf d7,poly
- > > rts
- >
- > If I understand your problem right, you wonder why the
- > two versions are almost equal in terms of speed? The scheduling
- > is not optimal in either version: you use the data that you
- > fetch in the very next instruction.
-
- If you had read my posting: I said I could not speed
- up the "normal index version" of the routine, even though
- I tried all the possible instruction combinations.
-
- Besides, the 64k-aligned routine is ~20% faster. And it is
- properly scheduled, at least on the 040: I gained about
- 10-15% by reordering the instructions into the order
- they are in now.
-
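For readers who do not read 68k assembly, the two quoted move.b instructions are a texel fetch followed by a shading-table lookup. Below is a rough C sketch of that pair of dependent loads. Everything around the two loads is an assumption: the 8.8 fixed-point stepping, the 256x256 texture layout, the 256-entry-per-shade table layout, and the name shade_span are all hypothetical, because the elided part of the loop is not shown in the thread. It also uses plain base + index addressing and does not reproduce the 64k-aligned variant.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill `count` pixels of one span.
 * texture:     256x256 bytes, index = (v_int << 8) | u_int   (assumed layout)
 * shade_table: 256 shades x 256 colours, index = (shade_int << 8) | texel
 * u, v, shade: 8.8 fixed point, stepped per pixel            (assumed) */
void shade_span(uint8_t *dst, size_t count,
                const uint8_t *texture, const uint8_t *shade_table,
                uint16_t u, uint16_t v, uint16_t shade,
                int16_t du, int16_t dv, int16_t dshade)
{
    for (size_t i = 0; i < count; i++) {
        /* First load: fetch the texel, like move.b (a3,d0.l),d1 */
        uint16_t texel_index = (uint16_t)((v & 0xFF00u) | (u >> 8));
        uint8_t texel = texture[texel_index];

        /* Second load depends on the first, like move.b (a4,d1.l),(a0)+ */
        uint16_t shade_index = (uint16_t)((shade & 0xFF00u) | texel);
        dst[i] = shade_table[shade_index];

        /* Step the interpolants; wrap-around at 16 bits is intended. */
        u = (uint16_t)(u + du);
        v = (uint16_t)(v + dv);
        shade = (uint16_t)(shade + dshade);
    }
}
```

The second load cannot start until the first has delivered the texel, which is the dependency the scheduling discussion above is about.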
- -- _
- a Stellar programmer _ //
- "Amiga - back for the future" \X/
-